5.3.7 Using EdgeR for Differential Analysis
EdgeR (Empirical Analysis of Digital Gene Expression Data in R) is an R Bioconductor
package for differential expression analysis of RNA-Seq data. It tests for differential
expression in replicated count data by fitting generalized linear models for over-dispersed
counts; the models account for both biological and technical variability. EdgeR
uses the negative binomial distribution to model RNA-Seq count data.
Assume that for each sample i, the total number of reads (library size) is $N_i$,
$\phi_g$ is the dispersion coefficient, and $p_{gi}$ is the relative abundance of gene g
in the experimental group i. The mean and variance are estimated as follows:
$$\mu_{gi} = N_i \, p_{gi} \tag{5.23}$$

$$\text{variance} = \mu_{gi} \left(1 + \mu_{gi} \, \phi_g\right) \tag{5.24}$$
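As a quick numerical check of Formulas (5.23) and (5.24), the following minimal sketch simulates negative binomial counts in base R and compares the sample mean and variance with the values the formulas predict. The library size, relative abundance, and dispersion used here are illustrative assumptions, not values from the data analyzed in this chapter. Base R's `rnbinom()` is parameterized by `size`, which corresponds to the reciprocal of the dispersion.

```r
set.seed(42)

# Illustrative (assumed) values for one gene g in one sample i
N_i   <- 1e6    # library size
p_gi  <- 1e-4   # relative abundance of gene g
phi_g <- 0.1    # dispersion coefficient

# Formula (5.23): expected count
mu_gi <- N_i * p_gi

# Formula (5.24): negative binomial variance
var_expected <- mu_gi * (1 + mu_gi * phi_g)

# Simulate counts; rnbinom() uses size = 1 / dispersion
counts <- rnbinom(100000, mu = mu_gi, size = 1 / phi_g)

# Compare the simulated moments with the formulas
print(c(mean_expected = mu_gi, mean_simulated = mean(counts)))
print(c(var_expected = var_expected, var_simulated = var(counts)))
```

With a dispersion close to zero, the simulated variance approaches the mean, which is the Poisson case.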
For differential expression analysis using negative binomial regression, the parameters
of interest are the relative abundances of each gene ($p_{gi}$).
As we have discussed above, the negative binomial distribution reduces to the Poisson
distribution when the count data is not over-dispersed ($\phi_g = 0$) and behaves like the
quasi-Poisson distribution if the variance is proportional to the mean. EdgeR interprets
the dispersion ($\phi_g$) in terms of the coefficient of variation (CV) of the biological
variation between the samples: the dispersion is the square of the biological coefficient
of variation (BCV). This follows from dividing Formula (5.24) by $\mu_{gi}^2$, which gives
the squared coefficient of variation of the counts.
$$\mathrm{CV}^2 = \frac{1}{\mu_{gi}} + \phi_g \tag{5.25}$$
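To make the step from Formula (5.24) to Formula (5.25) explicit, the division can be written out as follows. The numeric values used in the example ($\mu_{gi} = 100$, $\phi_g = 0.04$) are illustrative assumptions, not values from the data.

```latex
\begin{align*}
\mathrm{CV}^2 &= \frac{\text{variance}}{\mu_{gi}^2}
  = \frac{\mu_{gi}\,(1 + \mu_{gi}\,\phi_g)}{\mu_{gi}^2}
  = \frac{1}{\mu_{gi}} + \phi_g \\[4pt]
\text{Example: } \mu_{gi} = 100,\ \phi_g = 0.04
  &\;\Rightarrow\; \mathrm{CV}^2 = 0.01 + 0.04 = 0.05, \quad
  \mathrm{BCV} = \sqrt{\phi_g} = 0.2
\end{align*}
```

The first term, $1/\mu_{gi}$, is the Poisson (technical) contribution and shrinks as the expected count grows; the second term, $\phi_g$, is the biological contribution and does not depend on the count.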
EdgeR can estimate a common dispersion shared by all genes. It can also estimate gene-wise
dispersions, which it then shrinks toward a consensus value. Differential expression is
then assessed for each gene using an exact test for over-dispersed data [33].
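The steps just described (common dispersion, gene-wise dispersions, exact test) can be sketched with the edgeR package as shown below. This is a minimal illustration on simulated counts, assuming edgeR is already installed; the count matrix, the group labels, and the sample names are hypothetical and are not the "htcount.txt" data.

```r
library(edgeR)

set.seed(1)
n_genes <- 200

# Hypothetical design: three control and three treated samples
group <- factor(c("control", "control", "control",
                  "treated", "treated", "treated"))

# Simulated negative binomial counts (assumed mean 50, dispersion 0.1)
counts <- matrix(rnbinom(n_genes * 6, mu = 50, size = 1 / 0.1),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("s", 1:6)))

y <- DGEList(counts = counts, group = group)  # container for counts and groups
y <- calcNormFactors(y)                       # TMM normalization factors
y <- estimateCommonDisp(y)                    # one dispersion shared by all genes
y <- estimateTagwiseDisp(y)                   # moderated gene-wise dispersions
et <- exactTest(y)                            # exact test between the two groups

print(y$common.dispersion)
print(topTags(et, n = 5))
```

Because the simulated counts come from the same distribution in both groups, any genes reported near the top of the table are false positives; the sketch only shows the order of the calls.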
In the following, we will analyze the non-normalized count data obtained with the
HTSeq-count program in the previous step and saved as the “htcount.txt” file in the
“features” directory. The analysis will be carried out in R, so R must be installed on
your computer. Installation instructions for R are available at
“https://cran.r-project.org/”. You can also use R through Anaconda. We assume that R is
installed and running on your computer. In R, you will also need to install the limma and
EdgeR Bioconductor packages by following the installation instructions available at
“https://bioconductor.org/packages/release/bioc/html/edgeR.html” for EdgeR and
“https://bioconductor.org/packages/release/bioc/html/limma.html” for limma. For the
current versions, open R and run the following in the R console:
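# Install BiocManager from CRAN if it is not already available
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
# Install edgeR and limma from Bioconductor
BiocManager::install("edgeR")
BiocManager::install("limma")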
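After the installation finishes, it can be useful to confirm that both packages are available before continuing. The following is a small optional check using base R functions only; it is a convenience sketch and not part of the official installation instructions.

```r
# Report whether each package is installed and, if so, which version
for (pkg in c("edgeR", "limma")) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    message(pkg, " ", as.character(packageVersion(pkg)), " is installed")
  } else {
    message(pkg, " is not installed; rerun BiocManager::install(\"", pkg, "\")")
  }
}
```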